Efficient Use of Differentially Private Binary Trees
Abstract
Binary trees can be made differentially private by adding noise to every node and leaf. In this form they allow multifaceted exploration of a variable without revealing any individual information. A differentially private binary tree can be used and read just like its conventional exact-valued analog. However, different combinations of nodes contain overlapping information about the same quantities, so the statistical properties of multiple measurements under measurement error can be applied to noisy binary trees to create statistically efficient node estimates. We construct estimators that correctly use all available information in the tree, thus decreasing the error of nodes by up to eighty percent for the same level of privacy protection.
Differentially private binary trees are important summary statistics for a broad variety of uses and algorithms. They are central to the algorithm of Dwork et al. (2010) for releasing private streaming data, and are used in numerous adaptations of this problem, such as Chan et al. (2012), Cao et al. (2013), and Thakurta and Smith (2013). A binary tree can be used to compose probability and cumulative densities of variables, range queries, and, by Monte Carlo integration, means, medians, modes and variances. Thus the release of a private binary tree can be a broadly useful privacy-preserving means of allowing exploration of a variable, as for example in the statistical release of a non-interactive curator (Dwork and Smith, 2009). Due to the importance of binary trees, several heuristic adaptations of their use have been studied that bring about some improved accuracy for the same level of privacy guarantee (Hay et al. 2010, Xu et al. 2012, and relatedly Xiao et al. 2010). We show here how to derive an optimally efficient use of a private binary tree, in the precise sense of providing minimum variance unbiased estimates.
That is, we show how to refine a private tree in a manner that makes full and optimal use of all the information the tree contains. This is accomplished by linking the tree structure to the known statistical properties of multiple measures under measurement error.
1 Problem Statements
Consider a perfect binary tree in which every node, t_i, is the sum of all leaves below that node plus a random draw, ε_i, from some fixed distribution f(·), constructed to guarantee differential privacy. As represented below, the true values are denoted a through h at the leaves, the differentially private value revealed at any node is given to the right, and the index i both numbers the nodes sequentially and describes the path from the top of the tree to that node as a sequence of left (0) and right (1) progressions.
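The construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's estimator: the noise distribution is taken to be Laplace (the text only says "some fixed distribution"), the `scale` value is arbitrary and not calibrated to any privacy budget, the root is indexed by the empty path, and the names `laplace_draw`, `build_noisy_tree` and `combine_root` are hypothetical. The last function shows the simplest case of combining overlapping measurements: the root is measured once directly (noise variance σ²) and once as the sum of its two children (noise variance 2σ²), so inverse-variance weights of 2/3 and 1/3 give a combined variance of 2σ²/3.

```python
import random


def laplace_draw(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace distribution centred at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def build_noisy_tree(leaves, scale, rng):
    """Return {path: noisy value} for a perfect binary tree.

    A path is a string of '0' (left) and '1' (right) steps from the top;
    the root is the empty string (an assumption of this sketch).
    """
    n = len(leaves)
    if n == 0 or n & (n - 1):
        raise ValueError("the number of leaves must be a power of two")
    tree = {}

    def visit(path, values):
        # Each node reveals the sum of the leaves below it plus one noise draw.
        tree[path] = sum(values) + laplace_draw(scale, rng)
        if len(values) > 1:
            mid = len(values) // 2
            visit(path + "0", values[:mid])
            visit(path + "1", values[mid:])

    visit("", list(leaves))
    return tree


def combine_root(tree):
    """Inverse-variance weighted combination of two estimates of the root."""
    direct = tree[""]                      # one noise draw: variance sigma^2
    from_children = tree["0"] + tree["1"]  # two noise draws: variance 2*sigma^2
    return (2.0 * direct + from_children) / 3.0


if __name__ == "__main__":
    rng = random.Random(0)
    leaves = [3, 1, 4, 1, 5, 9, 2, 6]  # hypothetical true values a..h
    tree = build_noisy_tree(leaves, scale=1.0, rng=rng)
    print("noisy root:", tree[""])
    print("combined root estimate:", combine_root(tree))
```

The two-measurement combination uses only one level of the tree; an estimator that uses every level, as the text proposes, would have to account for all such overlapping sums at once.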
Similar resources
Differentially Private Projected Histograms: Construction and Use for Prediction
Privacy concerns are among the major barriers to efficient secondary use of information and data on humans. Differential privacy is a relatively recent measure that has received much attention in machine learning as it quantifies individual risk using a strong cryptographically motivated notion of privacy. At the core of differential privacy lies the concept of information dissemination through...
Differentially-Private Learning of Low Dimensional Manifolds
In this paper, we study the problem of differentially-private learning of low dimensional manifolds embedded in high dimensional spaces. The problems one faces in learning in high dimensional spaces are compounded in differentially-private learning. We achieve the dual goals of learning the manifold while maintaining the privacy of the dataset by constructing a differentially-private data struc...
A New Heuristic Algorithm for Drawing Binary Trees within Arbitrary Polygons Based on Center of Gravity
Graphs are widely used in software engineering, network engineering and electrical engineering. Graph drawing is a geometric representation of information. Among graphs, trees receive particular attention because of their capacity for hierarchical extension as well as their use in processing VLSI circuits. Many algorithms have been proposed for drawing binary trees within polygons. However, these algorithms generate b...
Differentially- and non-differentially-private random decision trees
We consider supervised learning with random decision trees, where the tree construction is completely random. The method was used as a heuristic working well in practice despite the simplicity of the setting, but with almost no theoretical guarantees. The goal of this paper is to shed new light on the entire paradigm. We provide strong theoretical guarantees regarding learning with random decis...
Probabilistic analysis of the asymmetric digital search trees
In this paper, by applying three functional operators the previous results on the (Poisson) variance of the external profile in digital search trees will be improved. We study the profile built over $n$ binary strings generated by a memoryless source with unequal probabilities of symbols and use a combinatorial approach for studying the Poissonized variance, since the probability distribution o...